Initial Comparison of Linguistic Networks Measures for Parallel Texts
Abstract
In this paper we compare the properties of linguistic networks for Croatian, English and Italian. We constructed co-occurrence networks from parallel text corpora consisting of the translations of five books in the three languages. We also generated an Erdős-Rényi random graph with the same number of nodes and links; comparing it with the linguistic co-occurrence networks showed that the latter have small-world properties. Furthermore, the comparison of the Croatian, English and Italian networks showed that, besides the expected commonalities, there are also certain differences. Across the three languages, the network measures differ most notably in average path length. The results indicate that the size of the corpus and anomalies in the text affect the network structure.
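The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes an undirected, unweighted network in which adjacent words are linked (`window=2`), and it uses only the Python standard library. The function names, the window size and the toy sentence are hypothetical choices for illustration; the paper's actual preprocessing and network construction are not given in the abstract and may differ.

```python
import random
from collections import deque
from itertools import combinations


def cooccurrence_network(tokens, window=2):
    """Undirected graph linking words at most `window - 1` positions apart (assumed setup)."""
    adj = {}
    for i, word in enumerate(tokens):
        adj.setdefault(word, set())
        for other in tokens[i + 1:i + window]:
            if other != word:  # ignore self-loops from repeated words
                adj[word].add(other)
                adj.setdefault(other, set()).add(word)
    return adj


def random_graph_like(adj, seed=0):
    """Erdős-Rényi G(n, m) graph with the same node and link counts as `adj`."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    rand = {v: set() for v in nodes}
    # Enumerating all pairs is quadratic in n; acceptable only for small sketches.
    for u, v in rng.sample(list(combinations(nodes, 2)), m):
        rand[u].add(v)
        rand[v].add(u)
    return rand


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


if __name__ == "__main__":
    # Toy input; a real study would use tokenized book-length texts.
    text = "the cat saw the dog and the dog saw the cat near the house"
    net = cooccurrence_network(text.split())
    rnd = random_graph_like(net)
    for name, g in (("co-occurrence", net), ("random", rnd)):
        print(name, average_path_length(g), clustering_coefficient(g))
```

A network is usually called small-world when its average path length is close to that of the matching random graph while its clustering coefficient is much higher; on a toy input like the one above the numbers are not meaningful, so the sketch only shows how the two measures would be computed and compared.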
Similar resources
Neural Networks in Spontaneous Speech Assessment of Dysphasic Patients
Neural networks can be successfully used for the classification of dysphasic subjects based on their conversational speech using a set of linguistic measures. I shall illustrate the approach with particular reference to its application in classifying agrammatic patients. Linguistic measures can be applied to the transcribed texts of conversational speech of both normal and agrammatic subjects a...
Parallel texts: Using translational equivalents in linguistic typology
Parallel texts are texts in different languages that can be considered translational equivalent. We introduce the notion ‘massively parallel text’ for such texts that have translations into very many languages. In this introduction we discuss some massively parallel texts that might be used for the investigation of linguistic diversity. Further, a short summary of the articles in this issue is ...
Quantitative Classification of Conversational Language Using Artificial Neural Networks
In this paper I shall describe the use of artificial neural networks for the classification of subjects based on their conversational speech using a set of linguistic measures with particular reference to the application of this approach in classifying dysphasic patients. These linguistic measures can be applied to the transcribed texts of conversational speech of both normal and dysphasic subj...
Extracting Recurrent Phrases and Terms from Texts Using a Purely Statistical Method
Most statistical measures for extracting interesting word pairs such as MI and t-score require a large corpus to work well. This paper evaluates some of the most widely used statistical measures and introduces a method that can identify significant bigrams in relatively small texts by adapting Fung and Church's (1994) K-vec algorithm, which was originally designed to extract word correspondence...
A connective differentiation of textual production in interaction networks
This paper explores textual production in interaction networks, with special emphasis on its relation to topological measures. Four email lists were selected, in which measures were taken from the texts participants wrote. Peripheral, intermediary and hub sectors of these networks were observed to have discrepant linguistic elaborations. For completeness of exposition, correlation of textual an...
Journal title:
- CoRR
Volume: abs/1405.1893
Issue: -
Pages: -
Publication date: 2014